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The correlation between a random sequence and its transformed sequences 
is studied. In the case of a permutation operation or, in other word, the 
shuffling operation, it is shown that the correlation can be so small that 
the sequences can be regarded as independent random sequences. The ap- 
plications to the Monte Carlo simulations are also given. This method is 
especially useful in the Ising Monte Carlo simulation. 
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1 Introduction 



Random numbers are indispensable to the Monte Carlo simulation, which is one 
of the most important applications of supercomputers. The recent development 
of vector computers makes more accurate Monte Carlo simulation possible. The 
Ising Monte Carlo simulation is the most highly developed application and the 
precision |jj ~~ Q now reaches to the order of 1CP 6 and sometimes such simulations 
consume of the order of 10 14 random numbers. 

When a high-precision Monte Carlo simulation is made, two points are impor- 
tant. One is the use of correct pseudo-random number. It is a quite reasonable 
requirement. And the other is the efficient use of random numbers. For example, 
the fastest Monte Carlo algorithm using the single-spin updating dynamics of the 
Ising model [p 7 ] - uses one random number for updating several spins which are 
coded in a computer word(independent-system coding technique)^" jg]. The spins 
in a word belong to physically independent systems and therefore it is expected that 
such a technique does not make a problem. The validity of this algorithm relics on 
the physical independence of the simulated systems. When we want to know the 
expectation values at many different temperatures or fields, this kind of efficient use 
of random number is very helpful and it has been used in many studies |[ |l2| ~ 0] . 
But if we want to know the expectation value at one temperature, this algorithm 
was not useful. To overcome this difficulty, the random shuffling of the Boltzmann 
factor tables(BFT, see Section 6) has been proposed^. Although it seemed to work, 
the validity and/or the limitation of such an operation has not been made clear. 

If we transform a given sequence of random numbers, {r„; n = 0, 1, 2, 3, • ■ ■}, 
using some map / from interval I = [0, I] to I, how does the new sequence {/(r n ); 
n = 0, I, 2, 3, • • •} behave? If f(x) ~ x 2 or some other smooth analytic function 
is used, the sequences {r n } and {/(r„)} will be strongly correlated. Intuitively if 
we use f{x) which maps any real number in I into / randomly, the sequence {r n } 
and {f(r n )} may be independent random sequences. On the digital computer, the 
reliable random real numbers distributed between and 1 are usually generated by 
normalizing random integers which are generated by a recursion relation p5f ~ ]2?| . 
When the random integer is distributed uniformly between and N — 1, the above 
mentioned map / can be regarded as a map from In = {0, 1,2, • • •, N — 1} to In- 
This kind of discrete random number is treated in this paper. 

The purpose of the present paper is to study the behavior of random sequences 
under transformations and to clarify whether there is room for improvement in Monte 
Carlo simulation in terms of this transformation strategy. And a new algorithm 
named recycle algorithm is proposed. This algorithm makes the meaning, validity 
and limitation of the BFT shuffling operation clear. 

Some relevant lemmas about the random sequence transformation are given in the 
next section. The algorithm of random permutation generation which is important 
in the application of our algorithm is studied in the third section. The general study 
of the efficiency of our algorithm is made in the fourth section. In the fifth section, a 
simple application to the static Monte Carlo integration is shown. The application 
to the dynamic Monte Carlo simulation of the Ising model is given in the sixth 
section. The final section is for the conclusion and remarks. 
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2 Some Lemmas 



In this section, some useful lemmas are given. Their proofs are not given but they 
are easily proved. 

Notation 

if is a field ( for example, real field or complex field) and is any set with N 
elements, where N is a positive integer. Tn denotes the set of all the maps from fl 
to ft. There are N N maps in Tn- The subset of Tn made of injections are denoted 
by Sn, which is naturally regarded as the symmetric group of order N. Fj^ denotes 
the set of all the maps from ft M to K . Jot denotes the composition of t € Tn and 
/ G F^. In the following lemmas, N is assumed to be larger than 2M. 

Although the lemmas below are proved under above-mentioned general situa- 
tions, we are interested in the application to the random sequence and Monte Carlo 
simulations. In these cases, the real field is usually used as K. And ft corresponds 
to the possible values of random number, that is, 

ft = {^;i = 0,l,2,---,AT-l}. (1) 

Instead of continuous random numbers, discrete random numbers are considered 
and it is appropriate because the pseudo-random number generated by a digital 
computer usually takes such discrete values. 

Definition 

For any / G F^, the average of /, is defined by 

ImU] = ^m Yl /(wi,W2,w 3 ,-",wm)- (2) 



Lemma 1. 

If Ii\f o t] — I^[f] for a given t G Tn and any / <E F^ , t is an element of Sn- 
Inversely, if t is an element of Sn, ° t] = I\[f] for any / in F± . 

This Lemma 1 suggests that the permutation operation is better than other 
operations in Tn to transform the random sequences into other random sequences. 
If a permutation is used, the averaged value of / is not biased at all. If we use the 
permutation, there is no deviation of this kind even for M variable function: 

Lemma 2. 

For any s G Sn and any / G F^, 

lZ\f°s]=I&\f\- (3) 

Therefore 

<A^[/o,s] 2 > seSjv =0, (4) 
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where < • > a eA denotes the average over a set A and 

< Al£[/ o s ] 2 > seSN =< I»[f o s ] 2 >seSN - < /£[/ o a] > 2 seSn . (5) 



But if an other transformation is used, a deviation of the order oil/y/N appears 
as shown in the following Lemma 3: 



Lemma 3. 



For any / e F» 



<I&[f°t]> t&rs =im (6) 
and 

< M^[fot? > teTN = ±I&[Af% (7) 

where 

/£[A/ 2 ]=l£[/ 2 ]-/£[/] 2 . (8) 

The following Lemma 4 shows that the correlation between a random sequence 
and the sequences obtained by its permutation becomes small when the resolution 
of the random number 1/N becomes high. 



Lemma 4 

For any s £ Sn and any /, g G F^, 



and 



where 



< /£[/ • <? o S ] > seSiV = + o(^) (9) 



< A/$[/ • ff o S ] 2 > seSjv = ^ + O(l), (10) 

C/, s = £$£[/] - dZmVWM - (ii) 

i>j 
M M 

E f, a = E£( J M a - ^M.ijt/])^^] 2 - w^^]), (12) 

i=l J=l 

D M,»,j[/1 = jyM — 1 £ /( W 1^2,W 3 ,---,W M ) (13) 



and 



^M,»,j[/] = jyaM-i £ /( w i> w 2,o;3,---,wm)/K,^,W3,---,Wm)- (I 4 ) 



The summation in eq. ( ^3[ ) runs over all the elements (wi, W2, ^3, ■ ■ •, wm) G ^ M 
which satisfy the conditions that u>i — ujj. The summation in eq. ( p"4| ) runs over all 
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the elements (u>i, 0J2, 0J3, ■ ■ ■, u>m) € Cl M and (u>[, uj' 2 , uj' 3 , ■ ■ ■, uj' m ) G Cl M which 
satisfy the conditions that uji = uj'j . 

The lemma 1, 2 and 4 suggests a new algorithm of random number generation 
and usage: 

Algorithm(Recycle Use of Random Number) 

From a sequence of random integers, a new random sequence is generated by 
applying permutation operations which are selected randomly. 

The efficiency analysis and real implementations of this algorithm are given in 
the rest sections. 

Here one remark should be made. When a fixed permutation s € Sn is applied 
to a random number repetitively, the randomness of the obtained sequence is worse 
than the sequence obtained by the above algorithm as is shown in the Lemma 5: 

Definition 

For any / e Ff* , s e Sn and lu e CI, J N is defined by 

L-l 

J N [f;s,u>]= lim T £(/°s fe )M. (15) 



Lemma 5. 

X M 



For any / <G F^ and uj G CI, 



and 



where 



< J N [f; s,co] > seSN = I»\f\ + - I N [f}) + 0(1), (16) 

< J N {f; S ,Lo}> seSNi0jm =lF{f} (17) 
< AJ N {f; S ,cof > seSN ,„en= ^/f[A/ 2 ] + 0(1), (18) 



PN = J2- k - (19) 



fe=i 

Note that the pn behaves as Oe + log AT when is large, where Oe is Euler's 



constant. Therefore the deviation is the order of ^/log N/N. 



3 Permutation Generation 

In the previous section, it is shown that a random sequence and its transformed 
sequence by permutation are almost independent with each other. Now the question 
is whether this recycle algorithm of a random sequence improves the efficiency of 
the real simulation or not. 
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In the real application, the random integers uniformly distributed between to 
2™ — 1 are easily generated and such a random integer is to be transformed by one 
table-lookup. For example, such random integers are generated by Tausworthc's 
methodp^]~p8[. The table holds a permutation of integers between to 2" — 1. 
When 20 bit random integer is used, that is, n = 20, the necessary memory storage 
for one such table is 4 byte x2 20 = 4 Mbyte. This is not a large table on the modern 
computers. 

The use of discrete random integers instead of real random numbers causes a 
correlation of the order of 2~™/ 2 as it is shown in Lemma 4. Therefore we have 
to analyze the efficiency carefully. For that purpose, it is necessary to know how 
to generate a permutation randomly. Random numbers are necessary to prepare a 
permutation randomly and it might be possible that the total amount of necessary 
random numbers was larger. Therefore the permutation generation methods are 
studied in this section to clarify the efficiency of the recycle algorithm. 

Consider N objects which are in the boxes, Aq, A\, A2, • • •, An -2 and A^-i at 
the beginning. Now we want to shuffle the objects and generate a new order which 
is independent of the original order. This is called the shuffling problem. 

This problem is easily solved by the following algorithm: Firstly, all the objects 
are put out of the boxes. Then select one object randomly and put it in Aq. And 
then select one object from the remains randomly and put it in A\ and so on. An 
naive implementation is the following FORTRAN subroutine: 

SUBROUTINE PERM1 (N , ILIST) 
IMPLICIT REAL*8 (A-H.O-Z) 
DIMENSION ILIST(0:N-1) 
DO 10 I=N-1,0,-1 

J=INT (RANDOM ( ) *DFL0AT ( 1+1 ) ) 

ITMP=ILIST(I) 

ILIST(I)=ILIST(J) 

ILIST(J)=ITMP 
10 CONTINUE 
RETURN 
END 

The function RANDOM () is assumed to generate one random real number uniformly 
distributed in the interval, I = {r;0 < r < 1}. 

Above program is, however, dangerous. We expect that the J generated by 
J=INT (RANDOM ( ) *DFL0AT ( 1+1 ) ) are distributed uniformly on {0, 1, 2, • • •, 1} but 
it is not always correct, especially when I is large. This is because the random 
real numbers generated by a random number generation routine take only discrete 
values. For example, if the random real number takes one of the values in {0, 0.1, 
0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9} uniformly and 1=7, the probabilities that J=0, 1, 
2, 3, 4, 5, 6 and 7 are 1/5, 1/10, 1/10, 1/10, 1/5, 1/10, 1/10 and 1/10, respectively. 
Of course, if the random number generation routine generates random real numbers 
which take all real*8 floating precision between and 1, such non- uniform property 
might be negligible. 

The above mentioned problem is removed in the following routine PERM2: 
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SUBROUTINE PERM2 (IB , N , ILIST) 
IMPLICIT REAL*8 (A-H.O-Z) 
DIMENSION ILIST(0:N-1) 

IF((2**(IB-1) . GE.N) .OR. (2**IB . LT . N) ) THEN 

WRITE(*,*) 'ERROR! IN PERM2 . SPECIFIED IB=',IB, ' AND N=' ,N 

WRITEO,*)' ARE NOT IN PROPER RANGE.' 

STOP 
END IF 

DO 10 1=1, IB 
IN=2**(IB-I) 
IS=1-I 

IFIRST=IN*2-1 
IF(I.EQ.1)IFIRST=N-1 
DO 20 J=IFIRST,IN,-1 
30 K=ISHFT(IRNDTW() ,IS) 

IF(K.GT. J) GOTO 30 

c 

ITMP=ILIST(K) 
ILIST(K)=ILIST(J) 
ILIST(J)=ITMP 
20 CONTINUE 
10 CONTINUE 
RETURN 
END 

The function IRNDTWO is assumed to generate random integers between to 2**IB-1 
(See Appendix A). The values of IB and N must satisfy the relation 

2 IB ~ 1 < N < 2 IB . (20) 

Two different implementations of shuffling routines are given here. The necessary 
number of random numbers for shuffling N elements, which is denoted by nn(N), are 
estimated here. In the first subroutine PERM1, nji(N) is N. In the second subroutine 
PERM2, tir(N) depends on the values of generated random numbers. About one half 
of the generated random numbers are not used. In average, ur(N) is 2N but there 
is some fluctuation. The following will be an overestimate of it: 

(N + 2Vn) + (^ + 2^) + (^ + 2^) + --- = 2N+^^Vn. (21) 

Therefore n R (N) = 3N is an overestimated value and it is a safe estimate. 

The shuffling algorithm implemented in the PERM2 routine is used in the following 
analysis. And we can say that the random number cost for the shuffling of N elements 
is 3N in the worst case and this nn(N) — 3N is used in the next section for the 
efficiency analysis of our recycle method. 
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4 Efficiency 



The Monte Carlo estimation of an M-dimcnsional integration: 



px™ ax /-a;™ ax «™ ax rx™? x 



(xi,x 2 ,x 3 , • • • ,XM)dxidx2<lxz • • -dXM 

(22) 

is considered to study the efficiency of our recycle method. The dynamic Monte 
Carlo simulation will correspond to the very large M case. After rescalings, = 
(Xi - x™ in )/(x™ ax - x™ in ) (i = 1, 2, 3, • • ■, M), this integration reduces to the 
estimation of 

I [f] = / / •••/ .f(yi,y2,y3,---,yM)dyidy 2 dy 3 ---dy M - (23) 
Jo Jo Jo Jo 

Instead of this integral, the discrete approximant I^if] is estimated by a Monte 
Carlo simulation. This [/] is expected to converge to /[/] when N goes to infinity. 

Before the Monte Carlo sampling is started, itabio permutation tables are pre- 
pared and it requires nn(N)L tah i c random numbers. The permutation operation in 
the i-th table is expressed by Sj (i = 1, 2, 3, • • •, liable)- Then a set of M random 
numbers r 1; r 2 , r 3 , • • •, which is denoted by re £1 M is generated and the value 
of f(r) is accumulated. The average of L samp i c samples is denoted by E ({ri'. i = l, 
2, 3, isampio})- At the same time, the values of /(sj(ri), Si(r 2 ), Sj(r 3 ), 
Si(rM)) {i — 1) 2, 3, • • •, Ltabic) which is denoted by f(si(r}) are also accumulated. 
The averages of L S am P ie samples are denoted by Ei({fi}). 

Each of these estimations Ei({fj}) (i = 0, 1, 2, • • ■, Ltabie) converges to I^if] as 
is shown in lemma 3 and the error defined by 

5 i ({f j })=E i ({f j })-im (24) 



converges as 

< ^({r,}) >f 1 ,f 2 ,?3,-, r W^"= ~T— 7 mI A / 2 ]- (25) 

-^sample 

Therefore the magnitude of the error is 

(26) 

y ^sample 

Now the behavior of the averages of Ei({fj}) (i = 1, 2, 3, • • ■, L ta bic), that is, 

-i Stable 

Sau({*i},W}) = 7 E Wi» (27) 

Stable r—f 

is studied. It is obvious that this -E a n converges to I^lf] when L sam pio — > oo even 
if itabio is finite. But is it true when Ltabie goes to oo and L S ampie is finite? The 
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question is how this quantity converges towards Ij^lf]- And the answer is: The error 
of the -Eaii defined by 



fa 1 }) = E*x({3ih {r 3 }) - iZif] (28) 



behaves as 



< < M{sJ,{^}) 2 >fx,f 2> r 3 ,...A Bample en« ) sl>S2)S3 ,... lSita 

^tablc-t^samDlo i» ^saniDlc JV iV 



stable -^sample -tV -^sample 

It is concluded from this eq. ( p9| ) that the error does not converge to zero when 
-Stable - ► and Sample is finite. From eq. (p9|), we can derive the condition under 
which the Monte Carlo simulation with the recycle algorithm is advantageous to the 
standard algorithm. 

The number of necessary random numbers to achieve a given error 5 is denoted 
by As(S) in the standard algorithm case and Ar(6) in the recycle algorithm case. 
The Ag (5) is obtained by solving the equation the right-hand side of eq. (|26| ) be 5: 



A S (S)/M 



= 5. (30) 



Therefore 



AsW = l£^£ (31) 

To get the Ar(6), we have to optimize the number of tables, L ta bie- When we use 
Stable tables, the accuracy is obtained from eq. ( p9| ) and 

A R (S) = ML 

sample Y^R 

(N)L tahlc (32) 

When one random permutation tabic is generated, nn(N) random numbers are 
necessary as studied in the previous section. After we minimize the Ar(S) under the 
condition that the error be S, we reach the expression 



when 



M '7-JVr A «i i 9hL 



^=w^ (J " [Am * } - (34) 



Therefore the recycle algorithm is efficient when 



in terms of the number of random numbers. Generally speaking, I^j[Af 2 ]/Cfj is 
usually much larger than one as is observed in the next section as an example and 
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(5 • N should be larger than one. After neglecting irrelevant terms, above condition 
reduces to 

I mi»[ap] 

V ^n R {N) > X > (36) 

which is usually satisfied in the high accuracy simulation case. Furthermore the 
optimal number of tables is 



Ltahlc V S 2 n R (N) ' (3?) 

which is a large number also in the high accuracy simulation case. 

These efficiency studies here show that our recycle algorithm in some cases is 
more efficient than the standard algorithm, ft is especially efficient for high accuracy 
Monte Carlo simulations. 

Only the number of random numbers was considered to study the efficiency in 
this section and there will be many other factors to determine the computational 
time in the real computer simulation. So we have to study the real efficiency and it 
is made in the next section and the result proves that the recycle algorithm is really 
efficient in high accuracy simulations. 



5 Simple Application 

In this section, an example is given to show that our recycle algorithm really re- 
duces the computational time without introducing any bias to the estimate and 
the lemma 4 and the recycle algorithm in section 2 works successfully. The Monte 
Carlo estimation of the volume of an M-dimensional hypcrsphere is considered in 
the following. 

In this case, the integrand is 

/(^*s,-,*m) = { 2 M if ^ e <1 . 08) 
where < Xi < l(i = 1, 2, 3, • • •, M). The answer is 

(39) 

where Vm denotes the volume of the M-dimensional unit hypersphcrc. The constants 
relevant to the efficiency analysis, I^[Af 2 ] and Cfj, are estimated to be 

J$[A/ 2 ] = (2 M -V M )V M (40) 

and 

Cw = M(, H ^M 2 (41) 

Two Monte Carlo programs are prepared for M — 5 case. In this case, I^[Af 2 } = 
(2 5 - V 5 )V 5 w 140.7 and C fJ = 10(V 5 - V2V 4 ) 2 w 29.41. One docs not use the 
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recycle algorithm and the other one uses it. They are optimized to their algorithms 
and the FORTRAN codes are given in the appendix A. 64 independent samples are 
used to estimate the statistical errors and each sample is an average over 10 3 ~ 
10 7 independent trials. In the recycle algorithm case, 63 random permutations are 
prepared and used. 

The results and performance for the 13 bit- wide random integer case, that is, the 
A = 2 13 = 8192 case are given in Table |l|. The exact value of the integral is 87r 2 /15 = 
5.26378901 • • •. The J est , converges correctly except for the iVtrial = 10 7 case. The 
Atrial = 10 7 simulation is more accurate than the accuracy limit determined by the 
random number bit-width. In this calculation, the accuracy limit is of the order of 

fctalt = ^ = 2 5 - S , (42) 

where B is the bit-width of the random numbers. The calculation reaches this 
accuracy after ./Viimit trials determined by 



A ni 



"limit ) 



(43) 



therefore 



When B = 13, <Wt = 0.0039 and A limit = 9.2 x 10 6 . The above calculation with 
Atrial = 10 7 means totally 6.4 x 10 7 trials which are much longer than An m it. In the 
Atrial = 10 6 case, the estimated accuracy is of the same order as the accuracy limit 
and therefore the results of those cases are not reliable, although they seem to be 
acceptable. 

In Tabic ^|, the result and performance with 16 bit-wide random integers are 
shown. In this case, the accuracy limit is 0.00049 and the maximum number of 
trials is 5.9 x 10 s . The calculations are tried up to A tr i a i = 10 6 . 

The results in Tables [l] and|^ show two things clearly: (l)our recycle algorithm 
works correctly and (2) it accelerates the high-precision and long Monte Carlo sim- 
ulations. 



6 Ising Monte Carlo Simulation 

An example which uses the recycle algorithm naively was shown in the previous 
section and an advanced application is given in this section. It is the Ising Monte 
Carlo simulation with an independent-system coding technique. 

The most efficient algorithm of this problem prepares and uses tables named 
Boltzmann- factor tables^ [?], which convert the value of a random integer into a 
few bits variable which is useful to update the spins coded in one computer word. 
For example, in the case of the ferromagnetic Ising model on a cubic lattice with 
nearest neighbor interaction without external field, two tables 1X1 and 1X2 are used. 
The Ith bits of these tables express the value of two-bit variables for the system 
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Atrial 


algorithm 


^est. 


X 


CPU time 


^ratio 


10 3 


NR 


5.296 ±0.043 


0.75 


0.8 






R 


5.217±0.040 


-1.17 


2.8 


0.3 


10 4 


NR 


5.254 ±0.015 


-0.65 


6 






R 


5.262 ±0.015 


-0.12 


4 


1.5 


10 5 


NR 


5.2648 ±0.0057 


0.18 


57 






R 


5.2674 ±0.0044 


0.82 


20 


2.9 


10 b 


NR 


5.2652 ±0.0014 


1.01 


570 






R 


5.2654 ±0.0017 


0.95 


179 


3.2 


10 7 


NR 


5.26765 ±0.00044 


8.77 


5694 






R 


5.26615 ±0.00049 


4.82 


1764 


3.2 



Tabic 1: The results and performance for 13 bit-wide random number case are 
shown. The column named algorithm shows which program was used. NR means 
the program with conventional random number usage and R means the recycle use 
algorithm for random numbers. The Atrial , lest, and T rat i denote the number of 
trials for one sample, the estimated volume of the five-dimensional unit hypersphere 
and the CPU time ratio of NR and R cases, respectively. 64 samples are used to 
estimate the value of 7 0St . and its error. The SUN Sparc Station 2 was used to 
measure the CPU time. The X denotes (Imc ~ JexactV^MCj where JmC; ^exact arL d 
SImc denote the estimated volume, exact volume and estimated error, respectively. 



Atrial 


algorithm 


^cst. 


X 


CPU time 


^ratio 


10 3 


NR 


5.342 ±0.045 


1.74 


0.8 






R 


5.238 ±0.051 


-0.51 


21 


0.04 


10 4 


NR 


5.249 ±0.015 


-0.99 


6 






R 


5.249 ±0.014 


-1.06 


24 


0.25 


10 5 


NR 


5.2665 ±0.0047 


0.58 


58 






R 


5.2602 ±0.0053 


-0.68 


52 


1.1 


10 B 


NR 


5.2619 ±0.0016 


1.18 


572 






R 


5.2634 ±0.0014 


-0.28 


340 


1.7 



Tabic 2: The results and performance for 16 bit- wide random number case are 
shown. The notations are the same as those in Table. 1. 
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coded in the Ith bit. When the spins in one word are updated, one random integer 
IR is converted into two values 1X1 (IR) and 1X2 (IR) . Each pair of bits in the same 
bit-positions of these IXl(IR) and 1X2 (IR) holds sufficient information about the 
random number to update each Ising spin coded in that bit-position. They are 
used to update the spins and IR is not necessary anymore. The recycle algorithm 
can be implemented by shuffling randomly the bit pairs in 1X1 and 1X2 before the 
spin updating operations. The subroutine BFSHUF for this operation is given in 
Appendix B and the shuffling operation is described in this FORTRAN subroutine. 
The algorithm is essentially the same as PERM2 routine given in the third section. 
The performance of this BFSHUF routine is measured with the NEC SX3/11 in Koln 



university and the MONTE4|29 in Japan Atomic Energy Research Institute, and 
it is given in Tabic ||. The MONTE4 has four processors and the performance 
is measured with one processor of them under single job environment, which is 
appropriate to compare the performance with the single processor machine, NEC 
SX3/11. The performance of the MONTE4 is largely influenced by the other jobs on 
other processors and the performance shown in Table |^ suffers from large fluctuations 
under multi-job environment. Although the time consuming operation of this routine 
is not vectorizable, the Boltzmann factor tables are shuffled within reasonable CPU 
time. This Ising Monte Carlo algorithm with table shuffling is useful because the 
systems at the same temperature can be simulated with the independent-system 
coding technique. 

Fig. [j] shows two sequences of the magnetization of 64 systems at the critical 
temperature and no correlation is observed in this figure. The correlation between 
the magnetization sequences is studied quantitatively here to check the validity of 
our algorithm. The magnetizations from the ith permuted random sequences are 
denoted by 

Mi(t), t= 1,2,3, (45) 

where t denotes the sample number in the temporal order and the magnetization 
is calculated in every constant Monte Carlo steps. We assume that there are n 
sequences, that is, i = 1, 2, 3, • • •, n. The correlation coefficient between samples i 
and j is defined by 

< Mi(t)Mj(t) > t 



J y/<Mi{ty > t <M {ty > t 

where < ■ >t denotes the temporal average, that is, 



(46) 



</(*)>*=-£/(*)• (47) 



The average of Cy over all different samples i and j, 

E 



(48) 



is estimated for several simulations with different parameters. The results are given 
m Table | and it is clearly observed that the correlation between the samples using 
permuted random numbers is negligibly small as is expected from our lemma 4. 
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IRBIT SX3/11 M0NTE4 



10 


0.046 sec. 


0.039 sec. 


11 


0.091 


0.078 


12 


0.19 


0.17 


13 


0.43 


0.37 


14 


1.0 


0.85 


15 


2.2 


2.0 


1G 


4.7 


4.2 


17 


9.8 


8.4 


18 


20 


17 


19 


41 


37 


20 


82 


74 


21 


165 


149 


22 


332 


284 


23 




568 



Table 3: The shuffling performance of the BFSHUF routine given in the Appendix B 
is listed. The IRBIT denotes the bit-width of the random integer and the size of the 
Boltzmann factor tables to be shuffled is 2 IRBIT . The 8 byte integer is specified for 
integer variables and arrays with -w option of the NEC FORTRAN77/SX compiler. 
The CPU time for the shuffling is measured and shown. For small value of IRBIT, the 
BFSHUF routine is called repeatedly so that the time may be the order of ten seconds 
and the averaged CPU time is shown. The IRBIT= 23 performance of SX3/11 is not 
available because of the memory-size limitation. 





IRBIT 


= 16 


IRBIT 


= 18 


IRBIT 


= 20 


m 


c 


X 


c 


X 


c 


X 


10 2 


-0.0011 


-0.92 


0.0002 


0.17 


-0.0009 


-0.75 


10 3 


0.00003 


0.08 


-0.00007 


-0.18 


0.00028 


0.72 


10 4 


0.00016 


1.33 


0.00012 


1.00 


0.00011 


0.92 


10 5 


0.000014 


0.39 


0.000052 


1.44 


0.000053 


1.36 



Table 4: The correlation c between the magnetization sequences from the simulation 
with permuted random numbers is shown. X denotes c/5c, where 6c denotes the 
estimated error of c. The system onallxllxl2 lattice was simulated at the critical 
temperature. The initial 10 4 MCS is skipped. The m denotes the number of samples 
and the magnetization is measured every 50 MCS. Five independent simulations are 
used to estimate the errors. No correlation is observed in these results. 
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Figure 1: The values of magnetizations Mi(t) and M2(t) of two samples with differ- 
ent permutations are plotted. After 10 4 MCS, the magnetizations were calculated 
100 times at every 1000 MCS at the critical temperature. Five independent simula- 
tions were made and therefore there are 500 points. No clear correlation is observed. 
The quantitative analysis of the correlation is given in Table 4. 

7 Conclusion and Remarks 

It is shown that the random permutation or random shuffling of the numbers in the 
interval [0, 1] transforms one uniformly distributed random sequences into another 
statistically almost independent random sequences. Based on this property, a new 
method named recycle algorithm for random number generation and its usage is pro- 
posed. Its efficiency is studied. It is also shown that this algorithm reduces the com- 
putational time of long and high-precision Monte Carlo simulation. This algorithm 
is especially useful in the Ising Monte Carlo simulation with independent-system 
coding technique. It makes it clear that the random shuffling of the Boltzmann fac- 
tor tables Q makes the statistically independent simulation possible for the systems 
with the same parameters using that coding technique. 

This recycle algorithm will be also useful in Monte Carlo simulations with MIMD 
architecture parallel computer. One processor generates random integers and broad- 
casts them to all the other processors. Each processor holds one random permutation 
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and transforms the broadcasted random integers with its table. When the cost for 
broadcasting is smaller than the generation, this method will be useful. 

The automatic optimization of the bit-width of the random numbers and the 
number of permutation tables is possible by using the results of efficiency analysis 
in the fourth section. It will be useful for moderate-scale Monte Carlo simulation. 

In the very large scale dynamics Monte Carlo simulation, this algorithm is often 
useful. The number of tables and bit-width can be determined by the necessary 
accuracy of the simulation and memory capacity of the computer. 
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Appendix A— Program for Hypersphere 

The FORTRAN programs for the Monte Carlo integration of the volume of five- 
dimensional hypersphere used in the sixth section are shown. Two kinds of programs 
are listed in the following. One uses the conventional algorithm and the other uses 
the recycle algorithm. 

Conventional Random Number Use 

The simulation parameters arc specified in two PARAMETER statements at the begin- 
ning. NSIZE and ISAMPL specify the number of trials in one sample and the number 
of samples, respectively. IRBIT and IRSEED specify the bit-width of the random 
number and the initial seed. IRSEED should be an odd integer. The random number 
generation routines arc given at the end of this Appendix A. 
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IMPLICIT REAL* 8 (A-H.O-Z) 
PARAMETER (NSIZE=10000000 , ISAMPL=64) 
PARAMETER (IRBIT=16 , IRSEED=14643557) 
DIMENSION ACC(ISAMPL) 
C INITIALIZATION OF RANDOM NUMBER GENERATION ROUTINE 

CALL INITTW(IRSEED , IRBIT) 
C MONTE CARLO SAMPLING 
DO 10 I=1,ISAMPL 

ACC(I)=SPHAMR(NSIZE,IRBIT) 
10 CONTINUE 
C ERROR ANALYSIS 
AAVE=0 . ODO 
ASQ=O.ODO 
DO 20 I=1,ISAMPL 
AAVE=AAVE+ACC(I) 
ASQ=ASQ+ACC(I)*ACC(I) 
20 CONTINUE 

AAVE=AAVE/DFLO AT (ISAMPL) 
ASQ=ASQ/DFLOAT(ISAMPL) 

AERR=DSQRT( (ASQ-AAVE*AAVE)/DFLOAT (ISAMPL- 1) ) 
C OUTPUT 

WRITE(6,100)NSIZE, ISAMPL, (NSIZE*ISAMPL) 
100 FORMAT (' TRIAL=',I8,' SAMPLE=' , 16 , ' TOTAL TRIAL= ' , 110) 

WRITE(6 , 110) IRBIT , 2 . OdO** (-IRBIT) 
110 FORMAT ( ' BIT WIDTH=' ,12, ' RNG RESOLUTION^ ,F10. 8) 

WRITE (6 , 120) AAVE , AERR 
120 FORMAT ( ' ESTIMATED VOLUME=' ,F10. 8, '+-' ,F10. 8) 

STOP 

END 

FUNCTION SPHAMR(N, IRBIT) 

IMPLICIT REAL* 8 (A-H.O-Z) 

AN0RMS=4.0D0**IRBIT 

ICNT=0 

DO 10 1=1, N 

A 1 =DFL0 AT ( IRNDTW ( ) ) 

A2=DFL0AT ( IRNDTW ( ) ) 

A3=DFL0 AT ( IRNDTW ( ) ) 

A4=DFL0 AT ( IRNDTW ( ) ) 

A5=DFL0AT ( IRNDTW ( ) ) 

A=A1*A1+A2*A2+A3*A3+A4*A4+A5*A5 

IF ( A . LT . ANORMS ) ICNT=ICNT+ 1 
10 CONTINUE 

SPHAMR=DFLOAT ( ICNT) *32 . ODO/DFLOAT (N) 

RETURN 

END 

Recycle Algorithm 

In the following program, the number of trials of one sample is NSIZE x IREP. 
ISAMPL specifies the number of random permutation tables and the original random 
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sequence is also used. Therefore the number of sample is ISAMPL+1. The random 
number generation routine at the end of this Appendix and the subroutine PERM2 in 
the third section arc also necessary. 

IMPLICIT REAL* 8 (A-H.O-Z) 

PARAMETER (IRBIT=16 , NSIZE=1000000 , IREP=10 , ISAMPL=63) 

PARAMETER(IRSEED=14643557) 

PARAMETER (ILIST=2**IRBIT) 

DIMENSION IX(0:ILIST-1,ISAMPL) ,IR(NSIZE,5) 
DIMENSION ACC(0:ISAMPL) 
CALL INITTW(IRSEED , IRBIT) 
DO 10 J=1,ISAMPL 
DO 20 I=0,ILIST-1 
IX(I,J)=I 
20 CONTINUE 

CALL PERM2(IRBIT, ILIST, IX(0, J) ) 
10 CONTINUE 

DO 30 I=0,ISAMPL 
ACC(I)=0.0D0 
30 CONTINUE 

DO 40 I=1,IREP 

CALL ITWDIM(NSIZE*5,IR) 

ACC(0)=ACC(0)+SPHAMC(NSIZE,IR(1,1) ,IR(1,2) , 
1 IR(1,3) ,IR(1,4) ,IR(1,5) , IRBIT) 

DO 50 J=1,ISAMPL 

ACC(J)=ACC(J)+SPHAMD(NSIZE,IR(1,1) ,IR(1,2) , 
1 IR(1 ,3) , IR(1 ,4) , IR(1 ,5) , IRBIT , ILIST, IX (0, J) ) 

50 CONTINUE 
40 CONTINUE 

AAVE=0 . 0D0 

ASQ=0.0D0 

DO 70 I=0,ISAMPL 

ACC ( I ) = ACC ( I ) /DFLO AT ( IREP ) 
AAVE=AAVE+ACC(I) 
ASQ=ASQ+ACC(I)*ACC(I) 
70 CONTINUE 

A AVE= AAVE/DFLO AT ( I SAMPL+ 1 ) 
ASQ=ASQ/DFLOAT( ISAMPL+1) 

AERR=DSQRT( (ASQ-AAVE*AAVE)/DFLOAT(ISAMPL) ) 
C OUTPUT 

WRITE(6,100)NSIZE*IREP, ISAMPL+1, NSIZE*IREP* (ISAMPL+1) 
100 FORMAT ( ' TRIAL= , ,I8,' SAMPLE=' , 16 , ' TOTAL TRIAL= 110) 

WRITE(6 , 110) IRBIT , 2 . OdO** (-IRBIT) 
110 FORMAT ( ' BIT WIDTH=' ,12, ' RNG RESOLUTION^ .F10.8) 

WRITE ( 6 , 120 ) AAVE , AERR 
120 FORMAT ( ' ESTIMATED VOLUME=' ,F10. 8, '+-' ,F10. 8) 

STOP 

END 

FUNCTION SPHAMC (N , IR1 , IR2 , IR3 , IR4 , IR5 , IRBIT) 
IMPLICIT REAL* 8 (A-H.O-Z) 

DIMENSION IR1(N) ,IR2(N) , IR3(N) , IR4(N) , IR5(N) 
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AN0RMS=2 . ODO**IRBIT 

ANORMS=ANQRMS*ANORMS 

ICNT=0 

DO 10 1=1, N 

A1=DFL0AT(IR1 (I) ) 

A2=DFL0AT ( IR2 ( I ) ) 

A3=DFL0 AT ( IR3 ( I ) ) 

A4=DFL0 AT ( IR4 ( I ) ) 

A5=DFL0AT ( IR5 ( I ) ) 

A=A1*A1+A2*A2+A3*A3+A4*A4+A5*A5 

IF ( A . LT . ANORMS ) ICNT=ICNT+ 1 
10 CONTINUE 

SPHAMC=DFLOAT (ICNT) *32 . ODO/DFLOAT (N) 

RETURN 

END 

FUNCTION SPHAMD (N, IR1 , IR2, IR3, IR4, IR5 , IRBIT.M, IX) 
IMPLICIT REAL* 8 (A-H.O-Z) 

DIMENSION IR1 (N) , IR2 (N) , IR3 (N) , IR4 (N) , IR5 (N) 

DIMENSION IX(0:M-1) 

AN0RMS=2 . ODO**IRBIT 

ANORMS=ANORMS* ANORMS 

ICNT=0 

DO 10 1=1, N 

A1=DFL0AT(IX(IR1 (I) ) ) 

A2=DFL0AT ( IX ( IR2 ( I ) ) ) 

A3=DFL0AT ( IX ( IR3 ( I ) ) ) 

A4=DFL0AT ( IX ( IR4 ( I ) ) ) 

A5=DFL0AT ( IX ( IR5 ( I ) ) ) 

A=A1*A1+A2*A2+A3*A3+A4*A4+A5*A5 

IF ( A . LT . ANORMS ) ICNT=ICNT+ 1 
10 CONTINUE 

SPHAMD=DFLOAT (ICNT) *32 . ODO/DFLOAT (N) 

RETURN 

END 

Random Number Generation Routine 

The random number generation routine used in the above two programs is given. 
The Tausworthe sequence (or Kirkpatrick and Stoll method) is used to generate the 
pseudo-random integers. 

SUBROUTINE INILSD(N) 

COMMON /LCSEED/NSEED 

NSEED=N 

RETURN 

END 

FUNCTION LCARNGO 
INTEGER*4 LCARNG 
DATA MASK31/Z' 7FFFFFFF ' / 
COMMON /LCSEED/NSEED 
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NSEED=NSEED*48828125 

NSEED=I AND (NSEED , MASK31 ) 

LCARNG=NSEED 

RETURN 

END 

SUBROUTINE INITTW(IRSEED, IRBIT) 
PARAMETER (ISELBT=-24) 

COMMON /TWST0R/IDIM(250) ,IPL(250) ,IQL(250) ,IP 
DATA IRCNST/Z ' 7FFFFFFF ' / 
CALL INILSD(IRSEED) 
DO 10 1=1,250 
IDIM(I)=0 
DO 20 J=l,32 
ITMP=LCARNG() 
ITMP=ISHFT (ITMP , ISELBT) 
ITMP=IAND(ITMP,1) 
IDIM(I)=ISHFT(IDIM(I) ,1) 
IDIM(I)=IOR(IDIM(I) ,ITMP) 
20 CONTINUE 
10 CONTINUE 
C BIT MASKING 
IRLST=0 

DO 40 1=1, IRBIT 

IRLST=I SHFT ( IRLST , 1 ) 
IRLST=I0R(IRLST,1) 
40 CONTINUE 

DO 30 1=1,250 

IDIM(I)=IAND(IDIM(I) , IRLST) 
30 CONTINUE 
IP=1 

DO 50 1=1,250 
IPL(I)=I+1 
IT=I-103 

IFCIT.LT. l)IT=IT+250 

IQL(I)=IT 
50 CONTINUE 

IPL(250)=1 

RETURN 

END 

FUNCTION IRNDTWO 

COMMON /TWST0R/IR(250) , IPL(250) , IQL(250) ,IP 

IRNDTW=IEOR(IR(IP) , IR(IQL(IP) ) ) 

IR(IP)=IRNDTW 

IP=IPL(IP) 

RETURN 

END 

SUBROUTINE ITWDIM(N.IDIM) 

COMMON /TWST0R/IR(250) ,IPL(250) ,IQL(250) ,IP 
DIMENSION IDIM(N) 
DO 10 1=1, N 
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IDIM(I)=IRNDTW() 
10 CONTINUE 
RETURN 
END 



Appendix B— Shuffling of Boltzmann-factor Table 

The shuffling routine for Boltzmann factor tables (BFT) of Ising Monte Carlo sim- 
ulation with independent-system coding technique is given. This BFSHUF rou- 
tine is for the simulation with two BFT and it is easily modified for more BFT 
simulation. This routine uses vectorized efficient random-number-generation rou- 
tine RND02I of RNDTIK library^) which generates Kirkpatrick-Stoll type random 
integers |27| . RNDD2I is assumed to be initialized to generate IRBIT-bit wide random 
integers. 

C 1992.10.26. BY NOBUYASU ITO 

C BOLTZMANN FACTOR TABLE SHUFFLING ROUTINE 

C 1X1, 1X2 : BOLTZMANN FACTOR TABLE TO BE SHUFFLED 

C IRBIT : BIT-WIDTH OF THE RANDOM NUMBER SHOULD BE GIVEN 

C THE SIZE OF 1X1 AND 1X2 IS ASSUMED TO BE 

C 1X1(0 :2**IRBIT-1) 1X2(0 : 2**IRBIT-1) . 

C IRD(N) : WORK AREA. ANY SIZE N IS GOOD BUT LARGE VALUE 

C (MORE THAN SEVERAL THOUSANDS) IS MORE EFFICIENT. 

SUBROUTINE BFSHUF (1X1 , 1X2 , IRBIT , N , IRD) 

PARAMETER ( IWIDTH=64 , NMAX=30) 

DIMENSION 1X1(0 :2**IRBIT-1) , 1X2(0 : 2**IRBIT-1) 

DIMENSION IRD(N) 

IRP0S=1 

CALL RND02I(N,IRD) 
DO 10 I=0,IWIDTH-2 
IP0S=ISHFT(1,I) 
DO 20 J=l, IRBIT 
IN=2**(IRBIT-J) 
IS=1-J 

DO 30 K=IN*2-1,IN,-1 
DO 40 L=1,NMAX 

IR=ISHFT(IRD(IRPOS) ,IS) 
IRP0S=IRP0S+1 
IF(IRPOS.GT.N)THEN 
CALL RND02I(N,IRD) 
IRP0S=1 
END IF 

IF(IR.LE.K)G0T0 50 
40 CONTINUE 

WRITE(*,*) 'ERROR IN BFSHUF! RANDOM NUMBER IS BIASED.' 
STOP 

50 ID=IE0R(IX1(K) ,IX1(IR)) 

ID=IAND(ID,IPOS) 
IX1(K)=IE0R(IX1(K) ,ID) 
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IX1(IR)=IEQR(IX1(IR) ,ID) 
ID=IE0R(IX2(K) ,IX2(IR)) 
ID=IAND(ID,IPOS) 
IX2(K)=IEDR(IX2(K) ,ID) 
IX2(IR)=IE0R(IX2(IR) ,ID) 



30 
20 
10 



CONTINUE 

RETURN 

END 



CONTINUE 
CONTINUE 
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